⚡️ Speed up function _byte_to_line_index by 39% in PR #1199 (omni-java) #1611

Open

codeflash-ai[bot] wants to merge 2 commits into omni-java
Conversation
The optimized code achieves a **39% runtime improvement** through two key micro-optimizations that reduce per-call overhead in this frequently executed helper function:

## Primary Optimizations

1. **Direct import binding**: Changed from `bisect.bisect_right()` to importing `bisect_right` directly as `_bisect_right`. This eliminates the attribute lookup (`bisect.`) on every function call, saving ~90-100ns per invocation as shown in the line profiler (977304ns → 887516ns for the bisect line).
2. **Conditional expression over `max()`**: Replaced `max(0, idx)` with `idx if idx > 0 else 0`. This avoids the overhead of calling the built-in `max()` with tuple packing/unpacking, reducing this line's execution time by ~40% (622046ns → 379545ns per the profiler).

## Why This Matters

The function maps byte offsets to line indices using binary search, a core operation that happens **2,158 times** in the profiled workload. These micro-optimizations compound significantly:

- **Test results show consistent 30-70% speedups** across all cases, with the most dramatic improvements (60-70%) occurring in edge cases like empty lists or single elements, where the overhead of `max()` represents a larger proportion of total execution time
- **Large-scale tests** (1000-line files with multiple queries) still achieve 27-43% improvements, demonstrating that the optimization scales well
- The optimization is particularly effective for **hot-path scenarios** like sequential offset queries (42.6% faster) and dense line-mapping operations

The changes preserve all behavior, including edge-case handling (negative indices, empty lists), while delivering substantial performance gains by eliminating unnecessary Python-level function-call overhead.
Contributor
## PR Review Summary

**Prek checks**: Fixed 3 issues (auto-fixed by ruff). All prek checks now pass.

**Mypy**: 19 pre-existing errors.

**Code review**: No critical issues found. The PR makes a single micro-optimization.

**Test Coverage**

Last updated: 2026-02-20
⚡️ This pull request contains optimizations for PR #1199

If you approve this dependent PR, these changes will be merged into the original PR branch omni-java.

📄 **39% (0.39x) speedup** for `_byte_to_line_index` in `codeflash/languages/java/instrumentation.py`

⏱️ **Runtime**: 924 microseconds → 663 microseconds (best of 163 runs)

📝 Explanation and details
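As a rough sanity check on the micro-optimization claims, here is a minimal, hypothetical `timeit` comparison of the two idioms the PR swaps (synthetic `line_starts` data; absolute timings vary by machine and are not the PR's profiler numbers):

```python
# Hypothetical micro-benchmark of the two idioms; not taken from the PR.
import bisect
import timeit
from bisect import bisect_right as _bisect_right

line_starts = list(range(0, 100_000, 80))  # synthetic line-start offsets


def slow(offset):
    idx = bisect.bisect_right(line_starts, offset) - 1  # attribute lookup each call
    return max(0, idx)                                   # builtin call with tuple packing


def fast(offset):
    idx = _bisect_right(line_starts, offset) - 1  # pre-bound function, no lookup
    return idx if idx > 0 else 0                  # conditional expression, no call


# Both variants must agree on every offset before comparing speed.
assert all(slow(o) == fast(o) for o in range(0, 100_000, 997))

for fn in (slow, fast):
    t = timeit.timeit(lambda: fn(54_321), number=100_000)
    print(f"{fn.__name__}: {t:.3f}s")
```

On CPython, `fast` typically edges out `slow` by a margin in the range the PR reports, precisely because each call avoids one attribute lookup and one builtin invocation.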
✅ Correctness verification report:
🌀 Generated Regression Tests
To edit these changes, run `git checkout codeflash/optimize-pr1199-2026-02-20T14.22.56` and push.